AITopics

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.59)

Neural Information Processing Systems, Aug-20-2025, 07:44:49 GMT

Outlier Detection and Robust PCA Using a Convex Measure of Innovation

Mostafa Rahmani, Ping Li

Neural Information Processing Systems http://nips.cc/

inlier, isearch, outlier, (15 more...)

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)

Technology:

Information Technology > Data Science > Data Mining > Anomaly Detection (0.66)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.47)

Neural Information Processing Systems, Aug-18-2025, 18:41:54 GMT

c4b0ffe9946b3a45063ac158b3cd2eff-Paper-Conference.pdf

artificial intelligence, data mining, machine learning, (18 more...)

Country:

North America > United States > New York > New York County > New York City (0.04)
Asia > Middle East > Israel > Haifa District > Haifa (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)
Information Technology > Data Science > Data Mining (0.67)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.49)

arXiv.org Machine Learning, Jul-1-2025

Global Convergence of Iteratively Reweighted Least Squares for Robust Subspace Recovery

Lerman, Gilad, Li, Kang, Maunu, Tyler, Zhang, Teng

Robust subspace estimation is fundamental to many machine learning and data analysis tasks. Iteratively Reweighted Least Squares (IRLS) is an elegant and empirically effective approach to this problem, yet its theoretical properties remain poorly understood. This paper establishes that, under deterministic conditions, a variant of IRLS with dynamic smoothing regularization converges linearly to the underlying subspace from any initialization. We extend these guarantees to affine subspace estimation, a setting that lacks prior recovery theory. Additionally, we illustrate the practical benefits of IRLS through an application to low-dimensional neural network training. Our results provide the first global convergence guarantees for IRLS in robust subspace recovery and, more broadly, for nonconvex IRLS on a Riemannian manifold.
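The reweighting idea behind IRLS for subspace recovery can be sketched as follows. This is a generic illustration, not the authors' algorithm: the function name `irls_subspace`, the PCA initialization, and the fixed smoothing floor `delta` are assumptions made here, whereas the paper analyzes a variant with dynamic smoothing.

```python
import numpy as np

def irls_subspace(X, d, n_iter=100, delta=1e-10):
    """Estimate a d-dimensional linear subspace from the rows of X (n x D).

    Returns a (D, d) matrix with orthonormal columns.
    """
    # Initialize with ordinary PCA (top-d right singular vectors).
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    V = Vt[:d].T
    for _ in range(n_iter):
        # Distance of each point to the current subspace.
        residual = X - (X @ V) @ V.T
        dist = np.linalg.norm(residual, axis=1)
        # Reweighting: far points (likely outliers) get small weight.
        # delta is a fixed smoothing floor (an assumption of this sketch).
        w = 1.0 / np.maximum(dist, delta)
        # Weighted least-squares step: top-d eigenvectors of the
        # weighted scatter matrix (eigh sorts eigenvalues ascending).
        C = (X * w[:, None]).T @ X
        _, evecs = np.linalg.eigh(C)
        V = evecs[:, -d:]
    return V
```

Each pass solves a weighted PCA problem, so points that sit far from the current estimate contribute less to the next one.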

artificial intelligence, dist, machine learning, (18 more...)

arXiv:2506.20533

Country:

North America > Canada > Ontario > Toronto (0.14)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
North America > United States > Minnesota (0.04)

Genre: Research Report > New Finding (0.87)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.87)

Han, Sangil, Kim, Kyoowon, Jung, Sungkyu

Subspace Recovery in Winsorized PCA: Insights into Accuracy and Robustness

arXiv.org Machine Learning, Feb-22-2025

In this paper, we explore the theoretical properties of subspace recovery using Winsorized Principal Component Analysis (WPCA), which relies on a common data transformation that caps extreme values to mitigate the impact of outliers. Despite the widespread use of winsorization in various tasks of multivariate analysis, its theoretical properties, particularly for subspace recovery, have received limited attention. We provide a detailed analysis of the accuracy of WPCA, showing that increasing the number of samples while decreasing the proportion of outliers guarantees the consistency of the sample subspaces from WPCA with respect to the true population subspace. Furthermore, we establish perturbation bounds that ensure the WPCA subspace obtained from contaminated data remains close to the subspace recovered from pure data. Additionally, we extend the classical notion of breakdown points to subspace-valued statistics and derive lower bounds for the breakdown points of WPCA. Our analysis demonstrates that WPCA exhibits strong robustness to outliers while maintaining consistency under mild assumptions. A toy example is provided to numerically illustrate the behavior of the perturbation bounds and the breakdown-point bounds, emphasizing winsorization's utility in subspace recovery.
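A minimal sketch of winsorization followed by PCA, under stated assumptions: the data are taken to be centered at the origin, and "capping" is implemented as radial winsorization (points with norm above a radius `r` are shrunk onto the sphere of radius `r`). The helper names and the choice of `r` are illustrative; the abstract does not specify the exact transformation.

```python
import numpy as np

def winsorize_radial(X, r):
    """Shrink every row of X with norm > r onto the sphere of radius r."""
    norms = np.linalg.norm(X, axis=1)
    # Guard against division by zero for rows at the origin.
    scale = np.minimum(1.0, r / np.maximum(norms, 1e-12))
    return X * scale[:, None]

def winsorized_pca(X, d, r):
    """Top-d principal directions of the radially winsorized data.

    Assumes X is already centered. Returns a (D, d) orthonormal basis.
    """
    Xw = winsorize_radial(X, r)
    C = Xw.T @ Xw / len(Xw)
    # eigh sorts eigenvalues ascending, so the last d columns are the top ones.
    _, evecs = np.linalg.eigh(C)
    return evecs[:, -d:]
```

Because each capped point has norm at most `r`, a single extreme observation can no longer dominate the scatter matrix, which is the intuition behind the robustness claims.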

breakdown point, outlier, subspace, (14 more...)

arXiv:2502.16391

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > California > Alameda County > Berkeley (0.04)
Asia > South Korea > Seoul > Seoul (0.04)

Genre: Research Report (0.81)

Industry:

Health & Medicine (0.46)
Government (0.46)

Technology:

Information Technology > Data Science (0.87)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.86)

Szwagier, Tom, Pennec, Xavier

Nested subspace learning with flags

arXiv.org Machine Learning, Feb-9-2025

Many machine learning methods look for low-dimensional representations of the data. The underlying subspace can be estimated by first choosing a dimension $q$ and then optimizing a certain objective function over the space of $q$-dimensional subspaces (the Grassmannian). Trying different $q$ yields in general non-nested subspaces, which raises an important issue of consistency between the data representations. In this paper, we propose a simple trick to enforce nestedness in subspace learning methods. It consists in lifting Grassmannian optimization problems to flag manifolds (the space of nested subspaces of increasing dimension) via nested projectors. We apply the flag trick to several classical machine learning methods and show that it successfully addresses the nestedness issue.
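To make "nested projectors" concrete, here is a small sketch that assumes the nested projector is the plain average of the orthogonal projectors onto each subspace of the flag. The function name `nested_projector` and the equal weighting are assumptions of this illustration; the abstract does not give the exact construction.

```python
import numpy as np

def nested_projector(U, dims):
    """Average of the projectors onto span(U[:, :q]) for each q in dims.

    U    : (D, q_max) matrix with orthonormal columns; its leading columns
           span the nested subspaces of the flag.
    dims : increasing subspace dimensions, e.g. (1, 2, 3).
    """
    U = np.asarray(U, dtype=float)
    P = np.zeros((U.shape[0], U.shape[0]))
    for q in dims:
        Uq = U[:, :q]
        P += Uq @ Uq.T  # orthogonal projector onto the q-dimensional subspace
    return P / len(dims)
```

Since every subspace is spanned by the leading columns of one shared basis, the subspaces are nested by construction, and directions that belong to more of them receive a larger weight in the averaged projector.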

artificial intelligence, machine learning, optimization problem, (17 more...)

arXiv:2502.06022

Country:

Europe > France > Provence-Alpes-Côte d'Azur (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
North America > United States > Wisconsin (0.04)
Asia > Japan > Honshū > Tōhoku (0.04)

Genre: Research Report (0.50)

Industry:

Education (0.46)
Health & Medicine (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Neural Information Processing Systems, Oct-10-2024, 09:40:57 GMT

Subspace Recovery from Heterogeneous Data with Non-isotropic Noise

Recovering linear subspaces from data is a fundamental and important task in statistics and machine learning. Motivated by heterogeneity in Federated Learning settings, we study a basic formulation of this problem: principal component analysis (PCA), with a focus on dealing with irregular noise. Our data come from $n$ users, with user $i$ contributing data samples from a $d$-dimensional distribution with mean $\mu_i$. Our goal is to recover the linear subspace shared by $\mu_1,\ldots,\mu_n$ using the data points from all users, where every data point from user $i$ is formed by adding an independent mean-zero noise vector to $\mu_i$. If we only have one data point from every user, subspace recovery is information-theoretically impossible when the covariance matrices of the noise vectors can be non-spherical, necessitating additional restrictive assumptions in previous work.
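A short calculation suggests why a single point per user is problematic for second-moment methods. It is only an illustration of the obstruction, not the paper's impossibility argument; the symbols $z_i$, $\Sigma_i$, and $x_i'$ are notation introduced here.

```latex
% Model from the abstract: x_i = \mu_i + z_i, with z_i a mean-zero noise vector.
% \Sigma_i := \mathrm{Cov}(z_i) is notation introduced for this sketch.
\mathbb{E}\left[ x_i x_i^\top \right] = \mu_i \mu_i^\top + \Sigma_i ,
\qquad
\mathbb{E}\left[ x_i {x_i'}^\top \right] = \mu_i \mu_i^\top
\quad \text{for two independent points } x_i,\ x_i' \text{ from user } i .
```

With one point per user, only $\mu_i \mu_i^\top + \Sigma_i$ is observable, so a non-spherical $\Sigma_i$ can tilt the leading eigenvectors away from the shared subspace. With two independent points, the noise terms cancel in expectation in the cross moment.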

heterogeneous data, non-isotropic noise, subspace recovery, (7 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.40)

arXiv.org Machine Learning, Apr-12-2024

Theoretical Guarantees for the Subspace-Constrained Tyler's Estimator

Lerman, Gilad, Yu, Feng, Zhang, Teng

This work analyzes the subspace-constrained Tyler's estimator (STE) [12], designed for recovering a low-dimensional subspace within a dataset that may be highly corrupted with outliers. It assumes a weak inlier-outlier model and allows the fraction of inliers to be smaller than a fraction that leads to computational hardness of the robust subspace recovery problem. It shows that in this setting, if the initialization of STE, which is an iterative algorithm, satisfies a certain condition, then STE can effectively recover the underlying subspace. It further shows that under the generalized haystack model, STE initialized by Tyler's M-estimator (TME) can recover the subspace when the fraction of inliers is too small for TME to handle.
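For orientation, the standard fixed-point iteration for Tyler's M-estimator (TME), which the abstract uses as the initializer, can be sketched as below. This covers plain TME only; the subspace-constrained step of STE is not described in the abstract and is not reproduced. The function name, the trace normalization to `D`, and the stopping rule are choices made for this sketch, and the rows of `X` are assumed to be nonzero, centered, and more numerous than the dimension.

```python
import numpy as np

def tyler_m_estimator(X, n_iter=200, tol=1e-10):
    """Tyler's M-estimator of the shape matrix of the rows of X (n x D).

    Assumes centered data with no zero rows and n > D.
    Returns a (D, D) matrix normalized to have trace D.
    """
    n, D = X.shape
    S = np.eye(D)
    for _ in range(n_iter):
        S_inv = np.linalg.inv(S)
        # Quadratic forms q_i = x_i^T S^{-1} x_i for every row.
        q = np.einsum("ij,jk,ik->i", X, S_inv, X)
        # Each point contributes x_i x_i^T / q_i, which does not depend
        # on the length of x_i, only on its direction.
        S_new = (X / q[:, None]).T @ X
        S_new *= D / np.trace(S_new)
        converged = np.linalg.norm(S_new - S) < tol
        S = S_new
        if converged:
            break
    return S
```

Rescaling any data point by a positive constant leaves the estimate unchanged, which is one reason the estimator tolerates outliers of arbitrary magnitude.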

inequality follow, matrix, tme, (15 more...)